Malicious Use
A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy
Wang, Huandong, Fu, Wenjie, Tang, Yingzhou, Chen, Zhilong, Huang, Yuxi, Piao, Jinghua, Gao, Chen, Xu, Fengli, Jiang, Tao, Li, Yong
While large language models (LLMs) hold significant potential for supporting numerous real-world applications and delivering positive social impacts, they still face serious challenges from the inherent risks of privacy leakage, hallucinated outputs, and value misalignment, and can be maliciously used to generate toxic content and serve unethical purposes after being jailbroken. Therefore, in this survey, we present a comprehensive review of recent advancements aimed at mitigating these issues, organized across the four phases of LLM development and usage: data collection and pre-training, fine-tuning and alignment, prompting and reasoning, and post-processing and auditing. We elaborate on recent advances in enhancing the performance of LLMs in terms of privacy protection, hallucination reduction, value alignment, toxicity elimination, and jailbreak defenses. In contrast to previous surveys that focus on a single dimension of responsible LLMs, this survey presents a unified framework encompassing these diverse dimensions, providing a comprehensive view of how to enhance LLMs to better serve real-world applications.
- Europe > United Kingdom (0.14)
- Asia > China > Beijing > Beijing (0.04)
- Asia > Singapore (0.04)
- (8 more...)
- Research Report (1.00)
- Overview (1.00)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- (2 more...)
The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation
Brundage, Miles, Avin, Shahar, Clark, Jack, Toner, Helen, Eckersley, Peter, Garfinkel, Ben, Dafoe, Allan, Scharre, Paul, Zeitzoff, Thomas, Filar, Bobby, Anderson, Hyrum, Roff, Heather, Allen, Gregory C., Steinhardt, Jacob, Flynn, Carrick, hÉigeartaigh, Seán Ó, Beard, SJ, Belfield, Haydn, Farquhar, Sebastian, Lyle, Clare, Crootof, Rebecca, Evans, Owain, Page, Michael, Bryson, Joanna, Yampolskiy, Roman, Amodei, Dario
This report surveys the landscape of potential security threats from malicious uses of AI, and proposes ways to better forecast, prevent, and mitigate these threats. After analyzing the ways in which AI may influence the threat landscape in the digital, physical, and political domains, we make four high-level recommendations for AI researchers and other stakeholders. We also suggest several promising areas for further research that could expand the portfolio of defenses, or make attacks less effective or harder to execute. Finally, we discuss, but do not conclusively resolve, the long-term equilibrium of attackers and defenders.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)
- Asia > China (0.14)
- Asia > Russia (0.14)
- (25 more...)
- Overview (1.00)
- Research Report > New Finding (0.67)
- Instructional Material > Course Syllabus & Notes (0.45)
- Transportation > Air (1.00)
- Media > News (1.00)
- Leisure & Entertainment > Games (1.00)
- (13 more...)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Communications > Networks (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
- (6 more...)
International Scientific Report on the Safety of Advanced AI (Interim Report)
Bengio, Yoshua, Mindermann, Sören, Privitera, Daniel, Besiroglu, Tamay, Bommasani, Rishi, Casper, Stephen, Choi, Yejin, Goldfarb, Danielle, Heidari, Hoda, Khalatbari, Leila, Longpre, Shayne, Mavroudis, Vasilios, Mazeika, Mantas, Ng, Kwan Yee, Okolo, Chinasa T., Raji, Deborah, Skeadas, Theodora, Tramèr, Florian, Adekanmbi, Bayo, Christiano, Paul, Dalrymple, David, Dietterich, Thomas G., Felten, Edward, Fung, Pascale, Gourinchas, Pierre-Olivier, Jennings, Nick, Krause, Andreas, Liang, Percy, Ludermir, Teresa, Marda, Vidushi, Margetts, Helen, McDermid, John A., Narayanan, Arvind, Nelson, Alondra, Oh, Alice, Ramchurn, Gopal, Russell, Stuart, Schaake, Marietje, Song, Dawn, Soto, Alvaro, Tiedrich, Lee, Varoquaux, Gaël, Yao, Andrew, Zhang, Ya-Qin
I am honoured to be chairing the delivery of the inaugural International Scientific Report on Advanced AI Safety. I am proud to publish this interim report which is the culmination of huge efforts by many experts over the six months since the work was commissioned at the Bletchley Park AI Safety Summit in November 2023. We know that advanced AI is developing very rapidly, and that there is considerable uncertainty over how these advanced AI systems might affect how we live and work in the future. AI has tremendous potential to change our lives for the better, but it also poses risks of harm. That is why having this thorough analysis of the available scientific literature and expert opinion is essential. The more we know, the better equipped we are to shape our collective destiny.
- Europe > United Kingdom > England > Buckinghamshire > Milton Keynes (0.24)
- North America > United States > California > Santa Clara County > Palo Alto (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- (51 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Overview (1.00)
- (2 more...)
- Transportation (1.00)
- Media > News (1.00)
- Leisure & Entertainment (1.00)
- (13 more...)
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Li, Nathaniel, Pan, Alexander, Gopal, Anjali, Yue, Summer, Berrios, Daniel, Gatti, Alice, Li, Justin D., Dombrowski, Ann-Kathrin, Goel, Shashwat, Phan, Long, Mukobi, Gabriel, Helm-Burger, Nathan, Lababidi, Rassin, Justen, Lennart, Liu, Andrew B., Chen, Michael, Barrass, Isabelle, Zhang, Oliver, Zhu, Xiaoyuan, Tamirisa, Rishub, Bharathi, Bhrugu, Khoja, Adam, Zhao, Zhenqi, Herbert-Voss, Ariel, Breuer, Cort B., Marks, Samuel, Patel, Oam, Zou, Andy, Mazeika, Mantas, Wang, Zifan, Oswal, Palash, Lin, Weiran, Hunt, Adam A., Tienken-Harder, Justin, Shih, Kevin Y., Talley, Kemper, Guan, John, Kaplan, Russell, Steneker, Ian, Campbell, David, Jokubaitis, Brad, Levinson, Alex, Wang, Jean, Qian, William, Karmakar, Kallol Krishna, Basart, Steven, Fitz, Stephen, Levine, Mindy, Kumaraguru, Ponnurangam, Tupakula, Uday, Varadharajan, Vijay, Wang, Ruoyu, Shoshitaishvili, Yan, Ba, Jimmy, Esvelt, Kevin M., Wang, Alexandr, Hendrycks, Dan
The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing further research into mitigating risk. Furthermore, they focus on only a few, highly specific pathways for malicious use. To fill these gaps, we publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 3,668 multiple-choice questions that serve as a proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP was developed by a consortium of academics and technical consultants, and was stringently filtered to eliminate sensitive information prior to public release. WMDP serves two roles: first, as an evaluation for hazardous knowledge in LLMs, and second, as a benchmark for unlearning methods to remove such hazardous knowledge. To guide progress on unlearning, we develop RMU, a state-of-the-art unlearning method based on controlling model representations. RMU reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs. We release our benchmark and code publicly at https://wmdp.ai
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > New York (0.04)
- North America > United States > Massachusetts (0.04)
- (6 more...)
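The WMDP abstract above describes RMU only at a high level: an unlearning method "based on controlling model representations" that degrades performance on hazardous knowledge while preserving general capabilities. A minimal, hypothetical sketch of such a representation-control loss is given below; the function name, coefficient values, and exact formulation are illustrative assumptions, not the authors' published implementation. The idea sketched here is to push hidden activations on hazardous ("forget") inputs toward a fixed random control direction, while anchoring activations on benign ("retain") inputs to those of the frozen original model.

```python
def rmu_style_loss(forget_acts, retain_acts, frozen_retain_acts,
                   control_vec, steering_coef=6.5, alpha=1200.0):
    """Hedged sketch of a representation-control unlearning loss.

    forget_acts        -- hidden activations on hazardous inputs (list of vectors)
    retain_acts        -- activations of the model being updated on benign inputs
    frozen_retain_acts -- activations of the frozen original model on the same inputs
    control_vec        -- a fixed random direction toward which forget
                          activations are steered
    steering_coef, alpha -- illustrative weights, not tuned values
    """
    def sq_dist(a, b):
        # mean squared distance between two activation vectors
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

    # Steer forget-set activations toward a scaled random direction,
    # scrambling the representations that encode hazardous knowledge.
    target = [steering_coef * c for c in control_vec]
    forget_term = sum(sq_dist(h, target) for h in forget_acts) / len(forget_acts)

    # Keep retain-set activations close to the frozen model's,
    # preserving general capabilities (e.g. ordinary biology, CS).
    retain_term = sum(sq_dist(h, h0)
                      for h, h0 in zip(retain_acts, frozen_retain_acts)) / len(retain_acts)

    return forget_term + alpha * retain_term
```

In an actual training loop this loss would be minimized over the updated model's parameters, typically on activations from a single intermediate layer; the sketch only shows the shape of the objective on toy vectors.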
Risk and Response in Large Language Models: Evaluating Key Threat Categories
Harandizadeh, Bahareh, Salinas, Abel, Morstatter, Fred
This paper explores the pressing issue of risk assessment in Large Language Models (LLMs) as they become increasingly prevalent in various applications. Focusing on how reward models, which are designed to fine-tune pretrained LLMs to align with human values, perceive and categorize different types of risks, we delve into the challenges posed by the subjective nature of preference-based training data. By utilizing the Anthropic Red-team dataset, we analyze major risk categories, including Information Hazards, Malicious Uses, and Discrimination/Hateful content. Our findings indicate that LLMs tend to consider Information Hazards less harmful, a finding confirmed by a specially developed regression model. Additionally, our analysis shows that LLMs respond less stringently to Information Hazards compared to other risks. The study further reveals a significant vulnerability of LLMs to jailbreaking attacks in Information Hazard scenarios, highlighting a critical security concern in LLM risk assessment and emphasizing the need for improved AI safety measures.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Asia > Russia (0.14)
- Europe > Russia (0.04)
- (3 more...)
Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models
Chan, Alan, Bucknall, Ben, Bradley, Herbie, Krueger, David
Public release of the weights of pretrained foundation models, otherwise known as downloadable access [Solaiman, 2023], enables fine-tuning without the prohibitive expense of pretraining. Our work argues that increasingly accessible fine-tuning of downloadable models may increase hazards. First, we highlight research to improve the accessibility of fine-tuning. We split our discussion into research that A) reduces the computational cost of fine-tuning and B) improves the ability to share that cost across more actors. Second, we argue that increasingly accessible fine-tuning methods may increase hazards by facilitating malicious use and making oversight of models with potentially dangerous capabilities more difficult. Third, we discuss potential mitigatory measures, as well as benefits of more accessible fine-tuning. Given substantial remaining uncertainty about hazards, we conclude by emphasizing the urgent need for the development of mitigations.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (5 more...)
An Overview of Catastrophic AI Risks
Hendrycks, Dan, Mazeika, Mantas, Woodside, Thomas
Rapid advancements in artificial intelligence (AI) have sparked growing concerns among experts, policymakers, and world leaders regarding the potential for increasingly advanced AI systems to pose catastrophic risks. Although numerous risks have been detailed separately, there is a pressing need for a systematic discussion and illustration of the potential dangers to better inform efforts to mitigate them. This paper provides an overview of the main sources of catastrophic AI risks, which we organize into four categories: malicious use, in which individuals or groups intentionally use AIs to cause harm; AI race, in which competitive environments compel actors to deploy unsafe AIs or cede control to AIs; organizational risks, highlighting how human factors and complex systems can increase the chances of catastrophic accidents; and rogue AIs, describing the inherent difficulty in controlling agents far more intelligent than humans. For each category of risk, we describe specific hazards, present illustrative stories, envision ideal scenarios, and propose practical suggestions for mitigating these dangers. Our goal is to foster a comprehensive understanding of these risks and inspire collective and proactive efforts to ensure that AIs are developed and deployed in a safe manner. Ultimately, we hope this will allow us to realize the benefits of this powerful technology while minimizing the potential for catastrophic outcomes.
- Asia > Russia (0.92)
- North America > United States > District of Columbia > Washington (0.14)
- Europe > Austria > Vienna (0.14)
- (22 more...)
- Research Report (1.00)
- Overview (1.00)
- Transportation (1.00)
- Media (1.00)
- Leisure & Entertainment > Games (1.00)
- (18 more...)
A tsunami of AI misinformation will shape next year's knife-edge elections John Naughton
It looks like 2024 will be a pivotal year for democracy. There are elections taking place all over the free world – in South Africa, Ghana, Tunisia, Mexico, India, Austria, Belgium, Lithuania, Moldova and Slovakia, to name just a few. Of these, the last may be the most pivotal because: Donald Trump is a racing certainty to be the Republican candidate; a significant segment of the voting population seems to believe that the 2020 election was "stolen"; and the Democrats are, well… underwhelming. The consequences of a Trump victory would be epochal. It would mean the end (for the time being, at least) of the US experiment with democracy, because the people behind Trump have been assiduously making what the normally sober Economist describes as "meticulous, ruthless preparations" for his second, vengeful term.
- North America > United States (0.90)
- North America > Mexico (0.25)
- Europe > Slovakia (0.25)
- (10 more...)
- Government > Voting & Elections (1.00)
- Government > Regional Government > North America Government > United States Government (0.51)
- Health & Medicine > Therapeutic Area > Immunology (0.39)
Malicious use of AI could cause 'unimaginable' damage, says UN boss
Malicious use of artificial intelligence systems could cause a "horrific" amount of death and destruction, the UN secretary general has said, calling for a new UN body to tackle the threats posed by the technology. António Guterres said harmful use of AI for terrorist, criminal or state purposes could also cause "deep psychological damage", and he said AI-enabled cyber-attacks were already targeting UN peacekeeping and humanitarian operations. "The malicious use of AI systems for terrorist, criminal or state purposes could cause horrific levels of death and destruction, widespread trauma and deep psychological damage on an unimaginable scale," Guterres said. Speaking at the first UN security council session on AI, he said the advent of generative AI – the term for AI tools such as ChatGPT that produce convincing text, image and voice from human prompts – could be a defining moment for disinformation and hate speech and add a "new dimension" to the manipulation of human behaviour. Guterres called for the creation of a new UN entity along the lines of the Intergovernmental Panel on Climate Change to tackle the risks.
- Government > Military (0.74)
- Government > Regional Government > Europe Government > United Kingdom Government (0.34)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
- Information Technology > Artificial Intelligence > Applied AI (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.81)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.58)
The AI Apocalypse: Will AI Take Over the World?
Welcome to the future where robots rule the world and humans are relegated to the sidelines. Sounds like a science fiction movie? The rapid advancement of AI technology has sparked a heated debate about the potential consequences of AI and its impact on the future of humanity. But one thing is certain: AI is not just a futuristic fantasy; it's happening right now.